AITopics | video action detection

On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes

Neural Information Processing Systems, Dec-26-2025, 14:13:19 GMT

This paper explores the impact of occlusions in video action detection. We facilitate this study by introducing five new benchmark datasets, namely O-UCF and O-JHMDB consisting of synthetically controlled static/dynamic occlusions, OVIS-UCF and OVIS-JHMDB consisting of occlusions with realistic motions, and Real-OUCF for occlusions in real-world scenarios. We formally confirm an intuitive expectation: existing models suffer a lot as occlusion severity is increased and exhibit different behaviours when occluders are static vs. when they are moving. We discover several intriguing phenomena emerging in neural nets: 1) transformers can naturally outperform CNN models which might have even used occlusion as a form of data augmentation during training, 2) incorporating symbolic components like capsules into such backbones allows them to bind to occluders never even seen during training, and 3) islands of agreement (similar to the ones hypothesized in Hinton et al.'s GLOM) can emerge in realistic images/videos without instance-level supervision, distillation or contrastive-based objectives (e.g.

benchmark dataset and training recipe, occlusion, video action detection, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Are all Frames Equal? Active Sparse Labeling for Video Action Detection

Neural Information Processing Systems, Dec-24-2025, 06:53:01 GMT

Video action detection requires annotations at every frame, which drastically increases the labeling cost. In this work, we focus on efficient labeling of videos for action detection to minimize this cost. We propose active sparse labeling (ASL), a novel active learning strategy for video action detection. Sparse labeling reduces the annotation cost but poses two main challenges: 1) how to estimate the utility of annotating a single frame for action detection, given that detection is performed at the video level, and 2) how these sparse labels can be used for action detection, which requires annotations on all frames. This work attempts to address these challenges within a simple active learning framework. For the first challenge, we propose a novel frame-level scoring mechanism aimed at selecting the most informative frames in a video. Next, we introduce a novel loss formulation which enables training of an action detection model with these sparsely selected frames. We evaluate the proposed approach on two different action detection benchmark datasets, UCF-101-24 and J-HMDB-21, and observe that active sparse labeling can be very effective in saving annotation costs. We demonstrate that the proposed approach performs better than random selection, outperforming all other baselines, with performance comparable to the supervised approach using merely 10% of the annotations.

action detection, active sparse, frame equal, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

5c81ea77a383cc2848d721224717fa4b-Paper-Conference.pdf

Neural Information Processing Systems, Aug-15-2025, 03:13:56 GMT

action detection, annotation, detection, (13 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
